Three-dimensional spatial structured encoding deep network for RGB-D scene parsing
WANG Zeyu, WU Yanxia, ZHANG Guoyin, BU Shuhui
Journal of Computer Applications    2017, 37 (12): 3458-3466.   DOI: 10.11772/j.issn.1001-9081.2017.12.3458
Abstract
Efficient feature extraction from RGB-D images and accurate learning of 3D spatial structure are the two keys to improving the performance of RGB-D scene parsing. Fully Convolutional Neural Networks (FCNNs) have recently shown a powerful feature extraction ability; however, they cannot sufficiently learn 3D spatial structure information. To solve this problem, a new neural network architecture called the Three-dimensional Spatial Structured Encoding Deep Network (3D-SSEDN) was proposed. A graphical model network and a spatial structured encoding algorithm were combined through an embedded structural learning layer, so that the 3D spatial distribution of objects could be precisely learned and described. With the proposed 3D-SSEDN, not only could the Hierarchical Visual Feature (HVF) and the Hierarchical Depth Feature (HDF), which contain hierarchical shape and depth information, be extracted, but a spatial structure feature containing 3D structural information could also be generated. Furthermore, a hybrid feature could be obtained by fusing these three kinds of features, so that the semantic information of RGB-D images could be accurately expressed. Experimental results on the standard RGB-D datasets NYUDv2 and SUNRGBD show that, compared with most previous state-of-the-art scene parsing methods, the proposed 3D-SSEDN significantly improves the performance of RGB-D scene parsing.
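The fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three features are fused, so plain per-pixel concatenation is assumed here purely for illustration. The function names, the `ssf` argument (standing in for the spatial structure feature), and the vector sizes are all hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def fuse_features(hvf: Vector, hdf: Vector, ssf: Vector) -> Vector:
    """Concatenate three per-pixel feature vectors into one hybrid vector.

    Assumption: fusion is simple concatenation; the paper may use a
    different (for example, learned) fusion scheme.
    """
    return list(hvf) + list(hdf) + list(ssf)


def fuse_feature_maps(
    hvf_map: List[Vector], hdf_map: List[Vector], ssf_map: List[Vector]
) -> List[Vector]:
    """Fuse three feature maps, each given as a flat list of per-pixel vectors."""
    if not (len(hvf_map) == len(hdf_map) == len(ssf_map)):
        raise ValueError("all feature maps must cover the same number of pixels")
    return [fuse_features(v, d, s) for v, d, s in zip(hvf_map, hdf_map, ssf_map)]


# Hypothetical toy input: 2 pixels, with 3-, 2- and 1-dimensional features.
hvf_map = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
hdf_map = [[1.0, 2.0], [3.0, 4.0]]
ssf_map = [[9.0], [8.0]]
hybrid_map = fuse_feature_maps(hvf_map, hdf_map, ssf_map)
```

In this sketch each pixel's hybrid vector has as many entries as the three input vectors combined (here 3 + 2 + 1 = 6), and a per-pixel classifier would then operate on that hybrid vector.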